Configuring the Python Setup

The first thing you will need to do is configure the Python setup for reticulate. reticulate works with a Miniconda installation, and you will need to make sure it is pointed at the correct conda installation if you also have Anaconda on your machine. After you run install.packages("reticulate"), reticulate can create a Miniconda installation for you (the r-miniconda entries listed below).

library(reticulate)
conda_list()
##                    name
## 1           r-miniconda
## 2          r-reticulate
## 3 r-reticulate-gary-env
##                                                                                   python
## 1                              C:\\Users\\garyh\\AppData\\Local\\r-miniconda\\python.exe
## 2          C:\\Users\\garyh\\AppData\\Local\\r-miniconda\\envs\\r-reticulate\\python.exe
## 3 C:\\Users\\garyh\\AppData\\Local\\r-miniconda\\envs\\r-reticulate-gary-env\\python.exe
#use_condaenv("anaconda3")

The approach I prefer to use is creating my own conda environment to store all the relevant packages and supporting information I need.

Creating your own conda environment

Creating your own environment is as simple as passing a new environment name to conda_create(), as shown below:

my_env <- "r-reticulate-gary-env"
conda_create(my_env)
## [1] "C:\\Users\\garyh\\AppData\\Local\\r-miniconda\\envs\\r-reticulate-gary-env\\python.exe"

The next step is to install the relevant Python packages into the new conda environment. The reason we want to use reticulate is to access all the cool packages that we do not have access to in native R.

Installing Python packages

Install the packages that you may require to work with in R:

py_install("pandas",envname = my_env) #Python's data frame library
py_install("numpy", envname = my_env) #Python's array library
py_install("seaborn", envname = my_env) #Python's visualisation library
py_install("scikit-learn",envname = my_env) #Python's Machine Learning library
py_install("matplotlib", envname = my_env) #Python's core visualisation library

The next step is to use the new Python environment we have just created, so that reticulate and R can work together.

use_condaenv(my_env)
conda_version()
## [1] "conda 4.9.0"
conda_list()
##                    name
## 1           r-miniconda
## 2          r-reticulate
## 3 r-reticulate-gary-env
##                                                                                   python
## 1                              C:\\Users\\garyh\\AppData\\Local\\r-miniconda\\python.exe
## 2          C:\\Users\\garyh\\AppData\\Local\\r-miniconda\\envs\\r-reticulate\\python.exe
## 3 C:\\Users\\garyh\\AppData\\Local\\r-miniconda\\envs\\r-reticulate-gary-env\\python.exe

Importing Python packages in reticulate style

For Python users, this bit will look a bit unfamiliar, as we are used to declaring imports like this: from mypackage import submodule. In reticulate, each imported module is instead stored as an R object by assigning the result of import() to a variable:

#Create python objects as R
numpy <- import("numpy")
## Warning: Python 'C:\Users\garyh\AppData\Local\r-miniconda\envs\r-
## reticulate-gary-env\python.exe' was requested but 'C:/Users/garyh/AppData/
## Local/r-miniconda/envs/r-reticulate/python.exe' was loaded instead (see
## reticulate::py_config() for more information)
pandas <- import("pandas")

# Import libraries for scikit-learn
sl_model_selection <- import("sklearn.model_selection")
skl <- import("sklearn")
skl_ensemble <- import("sklearn.ensemble")
skl_pipeline <- import("sklearn.pipeline")
skl_metrics <- import("sklearn.metrics")
skl_externals <- import("sklearn.externals")
skl_lm <- import("sklearn.linear_model")

# Import visualisation libraries
sns <- import('seaborn')
plt <- import('matplotlib.pyplot')

The setup has been completed. The next section will look at some basics, before jumping into how to use R and Python together to pass a data frame from R, to the Python ML packages, do some Python visuals, pass back to R and then back to an external Python file again.

Functions from Python in Reticulate

Functions in Python are defined with the def keyword. To use a Python function in R, define it with py_run_string():

py_run_string("def square_root(x):
                value = x ** 0.5  # ** is exponentiation, so this is the square root
                return(value)")

At first you will think this did absolutely nothing, but the function now exists in the Python session. To access Python objects from R, use the py object, as below:

py$square_root(10)
## [1] 3.162278

The py object exposes the Python objects that have been made available to R. Now I can access my custom square root function and pass a value to it; this is my preferred way. Another way to achieve this is with an eval-type statement:

py_eval("square_root(10)")
## [1] 3.162278
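To make the difference between running a string and evaluating a string a little more concrete, here is a rough Python-only analogue. It is a sketch, not how reticulate is implemented: the dict called namespace and the function add_one are made up for illustration, whereas reticulate runs the code in Python's __main__ module.

```python
# A plain dict stands in for the namespace that reticulate exposes as `py`.
namespace = {}

source = """
def add_one(x):
    value = x + 1
    return value
"""

# Like py_run_string(): executes statements and returns nothing visible.
exec(source, namespace)

# Like py$add_one(10): look the object up by name, then call it.
via_lookup = namespace["add_one"](10)

# Like py_eval("add_one(10)"): evaluate a single expression and return its value.
via_eval = eval("add_one(10)", namespace)

print(via_lookup, via_eval)
```

Both routes reach the same function; the lookup route is handy when you want to call it repeatedly with different R values.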

Modelling with Python and R - with the help of reticulate

The first step is to do some data preparation and wrangling to get the data into the right format. We are going to make this a regression task and I am going to try and predict TTBS_mins based on the other collected variables.

Data Setup

I am now going to read in the data and take a 20% random sample of the rows:

library(tidyverse) # read_csv, dplyr verbs and ggplot2
library(magrittr)  # %<>% assignment pipe
ttbs <- read_csv("Data/TTBS_Prediction.csv")
## 
## -- Column specification --------------------------------------------------------
## cols(
##   Age = col_double(),
##   EDPres30days = col_double(),
##   AssesTimeFirst_mins = col_double(),
##   TTBS_mins = col_double()
## )
ttbs %<>% 
  sample_frac(size=0.2)

The data is now ready: it has been sampled down to 20% of its rows and holds the four numeric fields shown in the column specification above.

Splitting data

Next, I split the data into the predictors (features) and the predicted variable (target):

# X (predictors) and Y (target)
X <- ttbs[,1:3]
Y <- data.frame(ttbs[,4])

Casting to a Python object

The important command to use here is r_to_py(), which converts the data frame, or other R object, into the associated Python object, such as a pandas data frame or a NumPy array. reticulate handles this conversion for you.

I will cast the ttbs data frame and the X and Y splits over to Python, to use the train test split functionality in Python.

py_ttbs <- r_to_py(ttbs)
py_X <- r_to_py(X)
py_Y <- r_to_py(Y)
py_ttbs$head() # Call the head on the Python object
##     Age  EDPres30days  AssesTimeFirst_mins  TTBS_mins
## 0  64.0           1.0                 46.0      356.0
## 1  76.0           1.0                 97.0      416.0
## 2  28.0           0.0                 63.0      330.0
## 3  37.0           0.0                 41.0      360.0
## 4  18.0           1.0                115.0      266.0
py_ttbs$dtypes # Python data types method
## Age                    float64
## EDPres30days           float64
## AssesTimeFirst_mins    float64
## TTBS_mins              float64
## dtype: object
py_ttbs$nunique # Without parentheses this returns the bound method itself; call py_ttbs$nunique() to get the counts
## <bound method DataFrame.nunique of        Age  EDPres30days  AssesTimeFirst_mins  TTBS_mins
## 0     64.0           1.0                 46.0      356.0
## 1     76.0           1.0                 97.0      416.0
## 2     28.0           0.0                 63.0      330.0
## 3     37.0           0.0                 41.0      360.0
## 4     18.0           1.0                115.0      266.0
## ...    ...           ...                  ...        ...
## 1105  49.0           0.0                 92.0      364.0
## 1106  18.0           1.0                138.0      287.0
## 1107  69.0           0.0                201.0      261.0
## 1108  24.0           1.0                 97.0      313.0
## 1109  67.0           1.0                 97.0      296.0
## 
## [1110 rows x 4 columns]>
py_ttbs$describe() #Python describe method, same as summary in R
##                Age  EDPres30days  AssesTimeFirst_mins    TTBS_mins
## count  1110.000000   1110.000000          1110.000000  1110.000000
## mean     44.829730      0.498198            99.492793   331.869369
## std      21.591143      0.500222            35.315344    40.722798
## min      18.000000      0.000000            23.000000   244.000000
## 25%      24.000000      0.000000            74.000000   301.000000
## 50%      40.000000      0.000000            97.000000   334.000000
## 75%      62.750000      1.000000           115.000000   360.000000
## max      97.000000      1.000000           207.000000   416.000000
py_list_attributes(py_ttbs) #Generate attribute list
##   [1] "Age"                                  
##   [2] "AssesTimeFirst_mins"                  
##   [3] "EDPres30days"                         
##   [4] "T"                                    
##   [5] "TTBS_mins"                            
##   [6] "_AXIS_LEN"                            
##   [7] "_AXIS_NAMES"                          
##   [8] "_AXIS_NUMBERS"                        
##   [9] "_AXIS_ORDERS"                         
##  [10] "_AXIS_REVERSED"                       
##  [11] "_AXIS_TO_AXIS_NUMBER"                 
##  [12] "__abs__"                              
##  [13] "__add__"                              
##  [14] "__and__"                              
##  [15] "__annotations__"                      
##  [16] "__array__"                            
##  [17] "__array_priority__"                   
##  [18] "__array_wrap__"                       
##  [19] "__bool__"                             
##  [20] "__class__"                            
##  [21] "__contains__"                         
##  [22] "__copy__"                             
##  [23] "__deepcopy__"                         
##  [24] "__delattr__"                          
##  [25] "__delitem__"                          
##  [26] "__dict__"                             
##  [27] "__dir__"                              
##  [28] "__div__"                              
##  [29] "__doc__"                              
##  [30] "__eq__"                               
##  [31] "__finalize__"                         
##  [32] "__floordiv__"                         
##  [33] "__format__"                           
##  [34] "__ge__"                               
##  [35] "__getattr__"                          
##  [36] "__getattribute__"                     
##  [37] "__getitem__"                          
##  [38] "__getstate__"                         
##  [39] "__gt__"                               
##  [40] "__hash__"                             
##  [41] "__iadd__"                             
##  [42] "__iand__"                             
##  [43] "__ifloordiv__"                        
##  [44] "__imod__"                             
##  [45] "__imul__"                             
##  [46] "__init__"                             
##  [47] "__init_subclass__"                    
##  [48] "__invert__"                           
##  [49] "__ior__"                              
##  [50] "__ipow__"                             
##  [51] "__isub__"                             
##  [52] "__iter__"                             
##  [53] "__itruediv__"                         
##  [54] "__ixor__"                             
##  [55] "__le__"                               
##  [56] "__len__"                              
##  [57] "__lt__"                               
##  [58] "__matmul__"                           
##  [59] "__mod__"                              
##  [60] "__module__"                           
##  [61] "__mul__"                              
##  [62] "__ne__"                               
##  [63] "__neg__"                              
##  [64] "__new__"                              
##  [65] "__nonzero__"                          
##  [66] "__or__"                               
##  [67] "__pos__"                              
##  [68] "__pow__"                              
##  [69] "__radd__"                             
##  [70] "__rand__"                             
##  [71] "__rdiv__"                             
##  [72] "__reduce__"                           
##  [73] "__reduce_ex__"                        
##  [74] "__repr__"                             
##  [75] "__rfloordiv__"                        
##  [76] "__rmatmul__"                          
##  [77] "__rmod__"                             
##  [78] "__rmul__"                             
##  [79] "__ror__"                              
##  [80] "__round__"                            
##  [81] "__rpow__"                             
##  [82] "__rsub__"                             
##  [83] "__rtruediv__"                         
##  [84] "__rxor__"                             
##  [85] "__setattr__"                          
##  [86] "__setitem__"                          
##  [87] "__setstate__"                         
##  [88] "__sizeof__"                           
##  [89] "__str__"                              
##  [90] "__sub__"                              
##  [91] "__subclasshook__"                     
##  [92] "__truediv__"                          
##  [93] "__weakref__"                          
##  [94] "__xor__"                              
##  [95] "_accessors"                           
##  [96] "_add_numeric_operations"              
##  [97] "_add_series_or_dataframe_operations"  
##  [98] "_agg_by_level"                        
##  [99] "_agg_examples_doc"                    
## [100] "_agg_summary_and_see_also_doc"        
## [101] "_aggregate"                           
## [102] "_aggregate_multiple_funcs"            
## [103] "_align_frame"                         
## [104] "_align_series"                        
## [105] "_box_col_values"                      
## [106] "_builtin_table"                       
## [107] "_can_fast_transpose"                  
## [108] "_check_inplace_setting"               
## [109] "_check_is_chained_assignment_possible"
## [110] "_check_label_or_level_ambiguity"      
## [111] "_check_setitem_copy"                  
## [112] "_clear_item_cache"                    
## [113] "_clip_with_one_bound"                 
## [114] "_clip_with_scalar"                    
## [115] "_combine_frame"                       
## [116] "_consolidate"                         
## [117] "_consolidate_inplace"                 
## [118] "_construct_axes_dict"                 
## [119] "_construct_axes_from_arguments"       
## [120] "_construct_result"                    
## [121] "_constructor"                         
## [122] "_constructor_expanddim"               
## [123] "_constructor_sliced"                  
## [124] "_convert"                             
## [125] "_count_level"                         
## [126] "_cython_table"                        
## [127] "_data"                                
## [128] "_deprecations"                        
## [129] "_dir_additions"                       
## [130] "_dir_deletions"                       
## [131] "_drop_axis"                           
## [132] "_drop_labels_or_levels"               
## [133] "_ensure_valid_index"                  
## [134] "_find_valid_index"                    
## [135] "_from_arrays"                         
## [136] "_get_agg_axis"                        
## [137] "_get_axis"                            
## [138] "_get_axis_name"                       
## [139] "_get_axis_number"                     
## [140] "_get_axis_resolvers"                  
## [141] "_get_block_manager_axis"              
## [142] "_get_bool_data"                       
## [143] "_get_cacher"                          
## [144] "_get_cleaned_column_resolvers"        
## [145] "_get_column_array"                    
## [146] "_get_cython_func"                     
## [147] "_get_index_resolvers"                 
## [148] "_get_item_cache"                      
## [149] "_get_label_or_level_values"           
## [150] "_get_numeric_data"                    
## [151] "_get_value"                           
## [152] "_getitem_bool_array"                  
## [153] "_getitem_multilevel"                  
## [154] "_gotitem"                             
## [155] "_indexed_same"                        
## [156] "_info_axis"                           
## [157] "_info_axis_name"                      
## [158] "_info_axis_number"                    
## [159] "_info_repr"                           
## [160] "_init_mgr"                            
## [161] "_internal_names"                      
## [162] "_internal_names_set"                  
## [163] "_is_builtin_func"                     
## [164] "_is_cached"                           
## [165] "_is_copy"                             
## [166] "_is_homogeneous_type"                 
## [167] "_is_label_or_level_reference"         
## [168] "_is_label_reference"                  
## [169] "_is_level_reference"                  
## [170] "_is_mixed_type"                       
## [171] "_is_view"                             
## [172] "_iset_item"                           
## [173] "_iter_column_arrays"                  
## [174] "_ix"                                  
## [175] "_ixs"                                 
## [176] "_join_compat"                         
## [177] "_maybe_cache_changed"                 
## [178] "_maybe_update_cacher"                 
## [179] "_metadata"                            
## [180] "_needs_reindex_multi"                 
## [181] "_obj_with_exclusions"                 
## [182] "_protect_consolidate"                 
## [183] "_reduce"                              
## [184] "_reindex_axes"                        
## [185] "_reindex_columns"                     
## [186] "_reindex_index"                       
## [187] "_reindex_multi"                       
## [188] "_reindex_with_indexers"               
## [189] "_replace_columnwise"                  
## [190] "_repr_data_resource_"                 
## [191] "_repr_fits_horizontal_"               
## [192] "_repr_fits_vertical_"                 
## [193] "_repr_html_"                          
## [194] "_repr_latex_"                         
## [195] "_reset_cache"                         
## [196] "_reset_cacher"                        
## [197] "_sanitize_column"                     
## [198] "_selected_obj"                        
## [199] "_selection"                           
## [200] "_selection_list"                      
## [201] "_selection_name"                      
## [202] "_series"                              
## [203] "_set_as_cached"                       
## [204] "_set_axis"                            
## [205] "_set_axis_name"                       
## [206] "_set_is_copy"                         
## [207] "_set_item"                            
## [208] "_set_value"                           
## [209] "_setitem_array"                       
## [210] "_setitem_frame"                       
## [211] "_setitem_slice"                       
## [212] "_slice"                               
## [213] "_stat_axis"                           
## [214] "_stat_axis_name"                      
## [215] "_stat_axis_number"                    
## [216] "_take_with_is_copy"                   
## [217] "_to_dict_of_blocks"                   
## [218] "_try_aggregate_string_function"       
## [219] "_typ"                                 
## [220] "_update_inplace"                      
## [221] "_validate_dtype"                      
## [222] "_values"                              
## [223] "_where"                               
## [224] "abs"                                  
## [225] "add"                                  
## [226] "add_prefix"                           
## [227] "add_suffix"                           
## [228] "agg"                                  
## [229] "aggregate"                            
## [230] "align"                                
## [231] "all"                                  
## [232] "any"                                  
## [233] "append"                               
## [234] "apply"                                
## [235] "applymap"                             
## [236] "asfreq"                               
## [237] "asof"                                 
## [238] "assign"                               
## [239] "astype"                               
## [240] "at"                                   
## [241] "at_time"                              
## [242] "attrs"                                
## [243] "axes"                                 
## [244] "backfill"                             
## [245] "between_time"                         
## [246] "bfill"                                
## [247] "bool"                                 
## [248] "boxplot"                              
## [249] "clip"                                 
## [250] "columns"                              
## [251] "combine"                              
## [252] "combine_first"                        
## [253] "compare"                              
## [254] "convert_dtypes"                       
## [255] "copy"                                 
## [256] "corr"                                 
## [257] "corrwith"                             
## [258] "count"                                
## [259] "cov"                                  
## [260] "cummax"                               
## [261] "cummin"                               
## [262] "cumprod"                              
## [263] "cumsum"                               
## [264] "describe"                             
## [265] "diff"                                 
## [266] "div"                                  
## [267] "divide"                               
## [268] "dot"                                  
## [269] "drop"                                 
## [270] "drop_duplicates"                      
## [271] "droplevel"                            
## [272] "dropna"                               
## [273] "dtypes"                               
## [274] "duplicated"                           
## [275] "empty"                                
## [276] "eq"                                   
## [277] "equals"                               
## [278] "eval"                                 
## [279] "ewm"                                  
## [280] "expanding"                            
## [281] "explode"                              
## [282] "ffill"                                
## [283] "fillna"                               
## [284] "filter"                               
## [285] "first"                                
## [286] "first_valid_index"                    
## [287] "floordiv"                             
## [288] "from_dict"                            
## [289] "from_records"                         
## [290] "ge"                                   
## [291] "get"                                  
## [292] "groupby"                              
## [293] "gt"                                   
## [294] "head"                                 
## [295] "hist"                                 
## [296] "iat"                                  
## [297] "idxmax"                               
## [298] "idxmin"                               
## [299] "iloc"                                 
## [300] "index"                                
## [301] "infer_objects"                        
## [302] "info"                                 
## [303] "insert"                               
## [304] "interpolate"                          
## [305] "isin"                                 
## [306] "isna"                                 
## [307] "isnull"                               
## [308] "items"                                
## [309] "iteritems"                            
## [310] "iterrows"                             
## [311] "itertuples"                           
## [312] "join"                                 
## [313] "keys"                                 
## [314] "kurt"                                 
## [315] "kurtosis"                             
## [316] "last"                                 
## [317] "last_valid_index"                     
## [318] "le"                                   
## [319] "loc"                                  
## [320] "lookup"                               
## [321] "lt"                                   
## [322] "mad"                                  
## [323] "mask"                                 
## [324] "max"                                  
## [325] "mean"                                 
## [326] "median"                               
## [327] "melt"                                 
## [328] "memory_usage"                         
## [329] "merge"                                
## [330] "min"                                  
## [331] "mod"                                  
## [332] "mode"                                 
## [333] "mul"                                  
## [334] "multiply"                             
## [335] "ndim"                                 
## [336] "ne"                                   
## [337] "nlargest"                             
## [338] "notna"                                
## [339] "notnull"                              
## [340] "nsmallest"                            
## [341] "nunique"                              
## [342] "pad"                                  
## [343] "pct_change"                           
## [344] "pipe"                                 
## [345] "pivot"                                
## [346] "pivot_table"                          
## [347] "plot"                                 
## [348] "pop"                                  
## [349] "pow"                                  
## [350] "prod"                                 
## [351] "product"                              
## [352] "quantile"                             
## [353] "query"                                
## [354] "radd"                                 
## [355] "rank"                                 
## [356] "rdiv"                                 
## [357] "reindex"                              
## [358] "reindex_like"                         
## [359] "rename"                               
## [360] "rename_axis"                          
## [361] "reorder_levels"                       
## [362] "replace"                              
## [363] "resample"                             
## [364] "reset_index"                          
## [365] "rfloordiv"                            
## [366] "rmod"                                 
## [367] "rmul"                                 
## [368] "rolling"                              
## [369] "round"                                
## [370] "rpow"                                 
## [371] "rsub"                                 
## [372] "rtruediv"                             
## [373] "sample"                               
## [374] "select_dtypes"                        
## [375] "sem"                                  
## [376] "set_axis"                             
## [377] "set_index"                            
## [378] "shape"                                
## [379] "shift"                                
## [380] "size"                                 
## [381] "skew"                                 
## [382] "slice_shift"                          
## [383] "sort_index"                           
## [384] "sort_values"                          
## [385] "squeeze"                              
## [386] "stack"                                
## [387] "std"                                  
## [388] "style"                                
## [389] "sub"                                  
## [390] "subtract"                             
## [391] "sum"                                  
## [392] "swapaxes"                             
## [393] "swaplevel"                            
## [394] "tail"                                 
## [395] "take"                                 
## [396] "to_clipboard"                         
## [397] "to_csv"                               
## [398] "to_dict"                              
## [399] "to_excel"                             
## [400] "to_feather"                           
## [401] "to_gbq"                               
## [402] "to_hdf"                               
## [403] "to_html"                              
## [404] "to_json"                              
## [405] "to_latex"                             
## [406] "to_markdown"                          
## [407] "to_numpy"                             
## [408] "to_parquet"                           
## [409] "to_period"                            
## [410] "to_pickle"                            
## [411] "to_records"                           
## [412] "to_sql"                               
## [413] "to_stata"                             
## [414] "to_string"                            
## [415] "to_timestamp"                         
## [416] "to_xarray"                            
## [417] "transform"                            
## [418] "transpose"                            
## [419] "truediv"                              
## [420] "truncate"                             
## [421] "tz_convert"                           
## [422] "tz_localize"                          
## [423] "unstack"                              
## [424] "update"                               
## [425] "value_counts"                         
## [426] "values"                               
## [427] "var"                                  
## [428] "where"                                
## [429] "xs"
py_len(py_ttbs) #Get the length of the dataset
## [1] 1110

Using Python’s train and test split

I will now use sklearn's train_test_split function to split my data into training and test sets, for use with sklearn later on:

split <- sl_model_selection$train_test_split(X, Y, test_size=0.25)
#Tap into the model_selection sub module in sklearn to get train_test_split function

train_test_split returns a tuple in Python, which reticulate hands back to R as a list. Python is cool as it allows for multiple assignment, but base R does not have that capability, so I have to select the relevant data frames from the list by index. The order is X train, X test, Y train, Y test:

py_X_train <- r_to_py(split[[1]])
py_X_test <- r_to_py(split[[2]])
py_Y_train <- r_to_py(split[[3]])
py_Y_test <- r_to_py(split[[4]])
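For comparison, this is what the multiple assignment looks like on the Python side. The function simple_train_test_split below is a made-up, plain-Python stand-in written for this illustration (it is not scikit-learn's implementation, and the toy data is invented); it only mirrors the order in which scikit-learn returns the four pieces.

```python
import random

def simple_train_test_split(X, y, test_size=0.25, seed=0):
    """Shuffle row indices and split them; returns X_train, X_test, y_train, y_test."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Toy data: 8 rows with one feature each; the target is 10 times the feature
X = [[i] for i in range(8)]
y = [10 * i for i in range(8)]

# Multiple assignment unpacks the returned tuple in one line
X_train, X_test, y_train, y_test = simple_train_test_split(X, y, test_size=0.25)

# The same pieces by position, which is what indexing the list does in R
parts = simple_train_test_split(X, y, test_size=0.25)
print(len(parts[0]), len(parts[1]))
```

Note that Python positions start at 0 while R list positions start at 1, so parts[0] here corresponds to split[[1]] in R.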

py_X_train$head() #Use head method in Python
##       Age  EDPres30days  AssesTimeFirst_mins
## 135  46.0           1.0                 69.0
## 706  64.0           0.0                 41.0
## 119  36.0           1.0                103.0
## 363  24.0           1.0                 97.0
## 229  61.0           0.0                 63.0

Fitting a model in scikit-learn (Python's ML library)

The next steps fit a linear regression model in scikit-learn. Unlike R, scikit-learn requires you to instantiate the model object before fitting. The code below shows the process:

sk_lm_model <- skl_lm$LinearRegression() #Instantiate the linear regression estimator
model <- sk_lm_model$fit(py_X_train, py_Y_train) #Fit the model object to the training set - scikit-learn takes the features and the target as separate inputs
r_squared <- model$score(py_X_test, py_Y_test) #The model score, for this model, is the R squared value on the test set, indicating how well the chosen predictors explain TTBS_mins
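As a reminder of what that score means, R squared is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it in plain Python; the function name r_squared is my own, and the five values are just the first five TTBS_mins shown by head() earlier, used as toy input rather than as a real evaluation.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [356.0, 416.0, 330.0, 360.0, 266.0]

# Predicting every value exactly gives the best possible score
perfect = r_squared(actual, actual)

# Always predicting the mean explains none of the variation
baseline = r_squared(actual, [sum(actual) / len(actual)] * len(actual))

print(perfect, baseline)
```

A model that does worse than always predicting the mean gets a negative score, so a value near zero on the test set is a sign that the predictors carry little information.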

To access the model results we use the following code - this will bring back the intercept terms and the coefficients:

model_intercept <- model$intercept_
model_coef <- model$coef_
print(model_intercept)
## [1] 330.6488
print(model_coef)
##           [,1]      [,2]       [,3]
## [1,] 0.8153936 0.9064243 -0.3552615
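To see how the intercept and coefficients combine into a prediction, here is a small plain-Python sketch using the numbers printed above. The function predict_one is made up for illustration, and it assumes the coefficients are in the same order as the columns of X (Age, EDPres30days, AssesTimeFirst_mins).

```python
# Values copied from the printed output above
intercept = 330.6488
coefficients = [0.8153936, 0.9064243, -0.3552615]  # Age, EDPres30days, AssesTimeFirst_mins

def predict_one(features):
    """Linear prediction: intercept plus the sum of coefficient * feature."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))

# First row shown by py_ttbs$head(): Age=64, EDPres30days=1, AssesTimeFirst_mins=46
prediction = predict_one([64.0, 1.0, 46.0])
print(round(prediction, 2))
```

This is the same arithmetic that model$predict() applies to every row, which is why the fitted model can be summarised by just these four numbers.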

Making predictions with the model

To make predictions with the model we will use the testing set that we created when we used the scikit-learn splitting function. This will allow us to validate the model fit visually:

model_predict <- model$predict(py_X_test)
#Create a data frame with the predictions
model_results <- data.frame(Predicted_Temp=model_predict, 
                            py_to_r(py_Y_test),
                            py_to_r(py_Y_test) - model_predict)
colnames(model_results) <- c("Predicted", "Actual", "Residual")

The output of model$predict() is converted to an R object automatically; however, py_Y_test is still in a native Python format, so I need to use the reverse casting function py_to_r() to convert it back to an object that R can work with. If I tried to pass this directly without the conversion, then I would get an error.

model_results is created as a data frame, and I then use the R colnames() function to rename its columns, passing the new names as an R character vector.

Visualising the fit with Seaborn

I will now convert my model_results frame back to a Python format (a Pandas data frame) to allow seaborn to interact with the columns and rows in the df.

# Convert model results back to Python to do stuff with
py_mod_results <- r_to_py(model_results)
py_mod_results$dtypes
## Predicted    float64
## Actual       float64
## Residual     float64
## dtype: object

This prints the data types of the Python object. In Python, this code would be py_mod_results.dtypes: the dollar ($) notation is replaced with a period (.).

Finally, we will pass this data frame to Seaborn to plot:

#Create line plot in seaborn
sns$lineplot(data=py_mod_results, x="Actual", y="Predicted")
## AxesSubplot(0.125,0.11;0.775x0.77)
plt$savefig("Images/seaborn.png")
knitr::include_graphics("Images/seaborn.png")

The plot returned is a Python (matplotlib) plot, so it is handled slightly differently from an R plot. I could use plt$show() directly after the code to view the chart, but this opens in Python and cannot then be integrated into the Markdown book, so I save the figure to a file and include the image instead.

Create the same plot in R

I will now create a similar plot in R:

plot <- model_results %>% 
  ggplot(aes(x=Actual, 
             y=Predicted)) + geom_point(color="blue") +
  geom_smooth(method = 'lm', formula = "y ~ x") 

plotly::ggplotly(plot) # Convert to a plotly object

Running an external python script

First, we need to write out the results from our R environment. I will use data.table to write this out quickly:

ttbs_reduced <- ttbs %>% 
  dplyr::select(-EDPres30days)
# Get rid of the ED presentation within last 30 days due to zero variance
data.table::fwrite(ttbs_reduced, "Data/ttbs.csv")

The example below shows how to run an external Python script. This Python script picks up the ttbs data written out above and creates an sns pairplot:

# Run the external script: it has a call to pick up the data and a function
# to create a pair plot
py_run_file("sns_plot.py")
plt$savefig("Images/snspairplot.png")
knitr::include_graphics("Images/snspairplot.png")

This ran the external Python script, which produced the chart. I saved the figure to a file (as R Markdown cannot view matplotlib plots) and then loaded it back in to display.

Creating a correlation matrix with Python’s heatmap

For the final example, I will demonstrate how to create a correlation heatmap in Python:

# Finally we will create a correlation matrix and plot it as a seaborn heatmap

corr <- py_ttbs$corr()
plt$clf()#Get rid of previous figure
sns$heatmap(corr, annot=TRUE, cmap="YlGnBu")
## AxesSubplot(0.0662639,0.0777257;0.726528x0.881274)
plt$savefig("Images/correlation_plot.png")
knitr::include_graphics("Images/correlation_plot.png")
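The numbers in each heatmap cell are pairwise correlations; pandas uses the Pearson coefficient by default. As a rough illustration of what is being computed, here is a plain-Python version. The function pearson and the three toy columns are invented for this sketch and are not the ttbs data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy columns
columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],  # rises in step with a
    "c": [8.0, 6.0, 4.0, 2.0],  # falls in step with a
}

# Correlation matrix as a dict of dicts: every column against every column
corr = {r: {c: pearson(columns[r], columns[c]) for c in columns} for r in columns}
print(corr["a"]["b"], corr["a"]["c"])
```

The diagonal of such a matrix is always 1, since every column is perfectly correlated with itself, which is why the heatmap shows a solid diagonal band.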

There is more you can do with reticulate, like combining it with S3 methods, but for the purposes of passing structures back and forth I find the approach shown here works best.

Find out more

If you want to find the code, click the GitHub image below. The repositories will also be listed when the webinar is posted by the NHS-R community.